Visualisation Documentation¶

Version: 1.0 (Jupytext, time measurements, logger)

Table of Contents¶

  • Notebook Description
  • General Settings
    • Paths
    • Notebook Functionality and Appearance
    • External Libraries
    • Internal Code
    • Constants
  • Visualization Examples
    • Histogram
    • Bar Chart
      • Bar Chart Sorted by Values
      • Bar Chart Sorted by IDs
    • Time Series Visualizations
      • Selectors
      • Time Series
        • One Time Series
        • Multiple Series
        • Filling
        • Anomalies
      • Time Series Events
        • Simple Plot
        • Full Plot
    • Multi Histogram - Distplot
    • Multi Histogram
    • Line Plot
  • Overall Customisation
    • Size of the Picture
  • Final Timestamp

Notebook Description¶

ToC

This notebook documents the visualisation code in src\visualisations.

GENERAL SETTINGS¶

ToC

General settings for the notebook (paths, Python libraries, internal code, notebook constants).

NOTE: All imports and constants for the notebook settings should be here. Nothing should be imported in the analysis section.

Paths¶

ToC

Adding paths that are necessary to import code from within the repository.

In [1]:
import sys
import os
sys.path+=[os.path.join(os.getcwd(), ".."), os.path.join(os.getcwd(), "../..")] # one and two up
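The cell above appends paths relative to the current working directory, so re-running it adds the same entries to sys.path again. A minimal sketch of a variant that resolves the directories and skips entries already present; the helper name `add_parent_dirs` and its parameters are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def add_parent_dirs(levels=(1, 2), base=None, path_list=None):
    """Append the directories `levels` steps above `base` to `path_list`.

    Hypothetical helper: `base` defaults to the current working directory
    and `path_list` to sys.path. Entries already present are skipped.
    """
    base = Path(base) if base is not None else Path.cwd()
    path_list = sys.path if path_list is None else path_list
    added = []
    for level in levels:
        candidate = base.resolve()
        for _ in range(level):
            candidate = candidate.parent
        candidate = str(candidate)
        if candidate not in path_list:
            path_list.append(candidate)
            added.append(candidate)
    return added


# Usage: one and two directories up, as in the cell above.
add_parent_dirs(levels=(1, 2))
```

Calling the helper a second time returns an empty list, because both directories are already on the path.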

Notebook Functionality and Appearance¶

ToC
Necessary libraries for notebook functionality:

  • A button for hiding/showing the code. By default it is deactivated; it can be activated by setting the CREATE_BUTTON constant to True.

    NOTE: When created through the function, the button works only in an active notebook. If the functionality needs to be preserved in an HTML export, the code has to be included directly in the notebook.

  • Set notebook width to 100%.
  • Notebook data frame setting for better visibility.
  • Initial timestamp setting and logging the start of the execution.
In [2]:
from src.utils.notebook_support_functions import create_button, get_notebook_name
from src.utils.logger import Logger
from src.utils.envs import Envs
from src.utils.config import Config
from pandas import options
from IPython.display import display, HTML

Constants for overall behaviour.

In [3]:
LOGGER_CONFIG_NAME = "logger_file_console" # default
PYTHON_CONFIG_NAME = "python_personal" # default
CREATE_BUTTON = False
ADDAPT_WIDTH = True
NOTEBOOK_NAME = get_notebook_name()
In [4]:
options.display.max_rows = 500
options.display.max_columns = 500
envs = Envs()
envs.set_logger(LOGGER_CONFIG_NAME)
envs.set_config(PYTHON_CONFIG_NAME)
Logger().start_timer(f"NOTEBOOK; Notebook name: {NOTEBOOK_NAME}")
if CREATE_BUTTON:
    create_button()
if ADDAPT_WIDTH:
    display(HTML("<style>.container { width:100% !important; }</style>")) # notebook width
A: ../../configurations\logger_file_console.conf
2023-06-05 15:08:00,234 - git.util - DEBUG - Failed checking if running in CYGWIN due to: FileNotFoundError(2, 'The system cannot find the file specified', None, 2, None)
2023-06-05 15:08:00,263 - file_console - DEBUG - Logger was created on JIRI-A in branche 012_check_rerun_and_save_all_notebooks.
2023-06-05 15:08:00,267 - file_console - DEBUG - Process: NOTEBOOK; Notebook name: visualisation_documentation.py; Timer started;
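The display options in the cell above are set globally for the whole session. If the wider limits are wanted for a single output only, pandas offers `option_context`, which restores the previous values when the block ends. A small sketch; the data frame is made up for illustration.

```python
import pandas as pd

# Illustrative frame with more columns than the default display limit.
frame = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

before = pd.get_option("display.max_columns")

# Raise the limits only inside the block; the old values return on exit.
with pd.option_context("display.max_rows", 500, "display.max_columns", 500):
    inside = pd.get_option("display.max_columns")
    text = repr(frame)  # rendered while the wider limits are active

after = pd.get_option("display.max_columns")
```

After the block, `after` equals `before`, so other notebooks or later cells are not affected.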

External Libraries¶

ToC

In [5]:
from importlib import reload
from pandas import Series, options, DataFrame
from numpy.random import seed, normal, randint
from numpy import array, log
from numpy.random import randn
from datetime import datetime

Internal Code¶

ToC
Code, libraries, classes, functions from within the repository.

In [6]:
from src.visualisations.visualisation_functions import create_time_series

Constants¶

ToC
Constants for the notebook.

NOTE: Please use upper-case letters for all constant names.

General Constants¶

ToC

In [7]:
# from src.global_constants import *  # Remember to import only the constants in use
N_ROWS_TO_DISPLAY = 2
FIGURE_SIZE_SETTING = {"autosize": False, "width": 2200, "height": 750}
DATA_PROCESSING_CONFIG_NAME = "data_processing_basic"

Constants for Setting Automatic Run¶

ToC

In [ ]:
 

Notebook Specific Constants¶

ToC

In [ ]:
 

Visualization Examples¶

ToC

Histogram¶

ToC

In [8]:
seed(185)

data = normal(size=500)
print(f"Type of the data is: {type(data)}")
print(f"Several observations: {data[0:10]}")

plot_title = "My Super Random Histogram"
x_title = "Awesome Random Data"
Type of the data is: <class 'numpy.ndarray'>
Several observations: [ 0.04471179 -1.09776817  1.63620698 -1.12098251 -0.93842551 -0.71691279
  0.77317168  0.31091126  0.67555112  1.00791911]

Plain Histogram¶

ToC

In [9]:
import src.visualisations.plotly_histogram as HIST

reload(HIST)

histogram = HIST.PlotlyHistogram()

histogram.plot(data=data, plot_title=plot_title, x_title=x_title)

Number of Bins¶

ToC

In [10]:
import src.visualisations.plotly_histogram as HIST

reload(HIST)

histogram = HIST.PlotlyHistogram()

histogram.plot(data=data, plot_title=plot_title, x_title=x_title, n_bins=4)

Setting the X-axis Range¶

ToC

In [11]:
import src.visualisations.plotly_histogram as HIST

reload(HIST)

histogram = HIST.PlotlyHistogram()

histogram.plot(data=data, plot_title=plot_title, x_title=x_title, n_bins=0, x_axis_min_max=(-6, 6))

Set Vertical Line¶

ToC

In [12]:
import src.visualisations.plotly_histogram as HIST

reload(HIST)

histogram = HIST.PlotlyHistogram()

histogram.plot(data=data, plot_title=plot_title, x_title=x_title, n_bins=0, x_axis_min_max=(-6, 6), 
               vertical_lines_positions=[-1.7, 0.4, 2.33])
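How `vertical_lines_positions` and `x_axis_min_max` are handled inside `PlotlyHistogram` is not shown in this notebook. One plausible approach, sketched here purely as an assumption, is to turn each position into a Plotly layout shape that spans the full plot height and to set the x-axis range in the same layout. The helper name `vertical_line_shapes` and its styling parameters are hypothetical.

```python
def vertical_line_shapes(positions, color="black", width=1):
    """Build Plotly layout shapes for vertical lines at the given x positions.

    Hypothetical helper; the dictionaries follow Plotly's `layout.shapes`
    schema, with `yref="paper"` so each line spans the whole plot height.
    """
    return [
        {
            "type": "line",
            "xref": "x",
            "yref": "paper",
            "x0": x,
            "x1": x,
            "y0": 0,
            "y1": 1,
            "line": {"color": color, "width": width},
        }
        for x in positions
    ]


# Same positions and axis range as in the cell above.
shapes = vertical_line_shapes([-1.7, 0.4, 2.33])
layout = {"xaxis": {"range": [-6, 6]}, "shapes": shapes}
```

A dictionary like `layout` could then be passed to a Plotly figure's `update_layout` method.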

Bar Chart¶

ToC

Bar Chart Sorted by Values¶

ToC

In [13]:
array_id = array(["Alpha", "Beta", "Gamma"])
array_values = array([10, 5, 30])
plot_title = "Super Bar Chart"
name_id = "Greek Alphabet"
name_values = "Occurrence"

order_by_values = True
reverse = True
In [14]:
import src.visualisations.plotly_bar_chart as BAR

reload(BAR)

bar_chart = BAR.PlotlyBarChart()

bar_chart.plot(array_id, array_values, plot_title, name_id, name_values, order_by_values, reverse)
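The effect of `order_by_values` and `reverse` can be illustrated without the plotting class. The sketch below assumes that `reverse=True` means descending order; the helper `sort_bars` is hypothetical and only mirrors the two flags, it is not the repository's implementation.

```python
def sort_bars(ids, values, order_by_values=True, reverse=False):
    """Return (ids, values) sorted by value or by ID.

    Hypothetical helper. Assumption: `reverse=True` means descending order.
    """
    key_index = 1 if order_by_values else 0
    pairs = sorted(zip(ids, values), key=lambda pair: pair[key_index], reverse=reverse)
    sorted_ids = [pair[0] for pair in pairs]
    sorted_values = [pair[1] for pair in pairs]
    return sorted_ids, sorted_values


ids = ["Alpha", "Beta", "Gamma"]
values = [10, 5, 30]

# Sorted by values, largest first.
by_value = sort_bars(ids, values, order_by_values=True, reverse=True)
# Sorted by IDs, ascending.
by_id = sort_bars(ids, values, order_by_values=False, reverse=False)
```

Under that assumption, `by_value` puts Gamma first and Beta last, while `by_id` keeps the alphabetical order of the labels.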

Bar Chart Sorted by IDs¶

ToC

In [15]:
array_id = array(["Alpha", "Beta", "Gamma"])
array_values = array([10, 5, 30])
plot_title = "Super Bar Chart"
name_id = "Greek Alphabet"
name_values = "Occurrence"

order_by_values = False
reverse = False
In [16]:
import src.visualisations.plotly_bar_chart as BAR

reload(BAR)

bar_chart = BAR.PlotlyBarChart()

bar_chart.plot(array_id, array_values, plot_title, name_id, name_values, order_by_values, reverse)
In [17]:
array_id = array([3, 2, 4, 1])
array_values = array([10, 5, 7, 30])
plot_title = "Super Bar Chart"
name_id = "Greek Alphabet"
name_values = "Occurrence"

order_by_values = False
In [18]:
import src.visualisations.plotly_bar_chart as BAR

reload(BAR)

bar_chart = BAR.PlotlyBarChart()

bar_chart.plot(array_id, array_values, plot_title, name_id, name_values, order_by_values)

Time Series Visualization¶

ToC

In [19]:
ts_1 = create_time_series(seed_number=11, x_multiplier=0.3)
ts_1_names = [f"Amazing Data {i}" for i in range(1, len(ts_1)+1)]
ts_2 = create_time_series(seed_number=32, x_multiplier=1.1)
ts_2_names = [f"Random Data {i}" for i in range(1, len(ts_2)+1)]
ts_3 = create_time_series(seed_number=64, x_multiplier=1.8)
ts_3_names = [f"Third TS {i}" for i in range(1, len(ts_3)+1)]

# ts_4 = create_time_series(seed_number=20, x_multiplier=0.7)
# ts_5 = create_time_series(seed_number=87, x_multiplier=0.1)
# ts_6 = create_time_series(seed_number=63, x_multiplier=2.5)

print(type(ts_1))
print(type(ts_1.iloc[0]))  # positional access; ts_1[0] is ambiguous on a DatetimeIndex
print(type(ts_1.index))
print("\n")
print(ts_1.head())

plot_title = "Awesome Random Time Series Plot"
y_title = "Awesome Random Data"
<class 'pandas.core.series.Series'>
<class 'numpy.float64'>
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>


2020-01-01    16.749455
2020-01-02    16.191528
2020-01-03    17.338647
2020-01-04    16.263316
2020-01-05    19.651911
dtype: float64

Selectors¶

ToC

In [20]:
import src.visualisations.plotly_time_series as TS

reload(TS)

ts_vizu_selectors = TS.PlotlyTimeSeries()

ts_vizu_selectors.set_selectors(selectors=[
    {"count": 3, "label": "3d", "step": "day", "stepmode": "backward"},
    {"count": 7, "label": "1w", "step": "day", "stepmode": "backward"},
    {"step": "all"}
])

ts_vizu_selectors.plot(
    series=[ts_1], 
    anomalies=[2, 10],
    plot_title=plot_title, 
    y_title=y_title
)
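The keys used in the selector dictionaries above (`count`, `label`, `step`, `stepmode`) match the attributes of Plotly's range-selector buttons. Assuming `set_selectors` forwards them to the x-axis of the figure, which this notebook does not confirm, the resulting layout fragment might look like the sketch below. The helper `selectors_to_layout` is hypothetical.

```python
selectors = [
    {"count": 3, "label": "3d", "step": "day", "stepmode": "backward"},
    {"count": 7, "label": "1w", "step": "day", "stepmode": "backward"},
    {"step": "all"},
]


def selectors_to_layout(buttons):
    """Wrap selector dictionaries in a Plotly x-axis layout fragment.

    Hypothetical helper; assumes the selectors are used as Plotly
    range-selector buttons on a date axis.
    """
    return {"xaxis": {"type": "date", "rangeselector": {"buttons": list(buttons)}}}


layout = selectors_to_layout(selectors)

# Collect the button labels; the "all" button has no explicit label here.
labels = [
    button.get("label", "all")
    for button in layout["xaxis"]["rangeselector"]["buttons"]
]
```

Each button with `stepmode` set to `"backward"` zooms the axis to the last `count` steps counted back from the end of the data, and `{"step": "all"}` resets the view to the full range.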

Time Series¶

ToC

One Time Series¶

ToC

In [21]:
import src.visualisations.plotly_time_series as TS

reload(TS)

ts_visu = TS.PlotlyTimeSeries()

ts_visu.plot(
    series=[ts_1],
    plot_title=plot_title, 
    y_title=y_title
)

Multiple Series¶

ToC

In [22]:
import src.visualisations.plotly_time_series as TS

reload(TS)

ts_visu = TS.PlotlyTimeSeries()

ts_visu.plot(
    series=[ts_1, ts_2, ts_3],  
    series_names=["Random Sinus", "Another Random Sinus", "Wooow"],
    series_obs_names=[ts_1_names, ts_2_names, ts_3_names],
    plot_title=plot_title, 
    y_title=y_title
)

Filling¶

ToC

In [23]:
import src.visualisations.plotly_time_series as TS

reload(TS)

ts_visu = TS.PlotlyTimeSeries()

ts_visu.plot(
    series=[ts_1, ts_1+1, ts_1-2], 
    plot_title=plot_title, 
    y_title=y_title, 
    fill_areas=True
)

Anomalies¶

ToC

In [24]:
import src.visualisations.plotly_time_series as TS

reload(TS)

ts_visu = TS.PlotlyTimeSeries()

ts_visu.plot(
    series=[ts_1, ts_2],  
    series_names=["Random Sinus", "Another Random Sinus"],
    series_obs_names=[ts_1_names, ts_2_names],
    anomalies=[2, 10],
    anomalies_obs_names=["False", "Perfect"],
    plot_title=plot_title, 
    y_title=y_title
)

Time Series Events¶

ToC

In [25]:
ts_1 = create_time_series(seed_number=11, x_multiplier=0.3)
ts_1_names = [f"Amazing Data {i}" for i in range(1, len(ts_1)+1)]
ts_2 = create_time_series(seed_number=32, x_multiplier=1.1)
ts_2_names = [f"Random Data {i}" for i in range(1, len(ts_2)+1)]

buys = ts_1.iloc[[1, 9, 10, 12, 13, 14]]
buy_names = [f"Buy {i+1}" for i, _ in enumerate(buys)]
sells = ts_1.iloc[[4, 17, 20, 22, 26]]
sell_names = [f"Sell {i+1}" for i, _ in enumerate(sells)]

print(type(ts_1))
print(type(ts_1.iloc[0]))  # positional access; ts_1[0] is ambiguous on a DatetimeIndex
print(type(ts_1.index))
print("\n")
print(ts_1.head())

plot_title = "Awesome Random Time Series Plot"
y_title = "Awesome Random Data"
<class 'pandas.core.series.Series'>
<class 'numpy.float64'>
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>


2020-01-01    16.749455
2020-01-02    16.191528
2020-01-03    17.338647
2020-01-04    16.263316
2020-01-05    19.651911
dtype: float64

Simple Plot¶

ToC

In [26]:
import src.visualisations.plotly_time_series_events as TS

reload(TS)

ts_visu = TS.PlotlyTimeSeriesEvents()

ts_visu.plot(
    series=[ts_1],
    events=[buys, sells],
    plot_title=plot_title, 
    y_title=y_title
)

Full Plot¶

ToC

In [27]:
import src.visualisations.plotly_time_series_events as TS

reload(TS)

ts_visu = TS.PlotlyTimeSeriesEvents()

ts_visu.plot(
    series=[ts_1, ts_2],
    series_names=["Wonder Series", "Normal Series"],
    series_obs_names=[ts_1_names, ts_2_names],
    events=[buys, sells],
    event_names=["Buys", "Sells"],
    event_obs_names=[buy_names, sell_names],
    plot_title=plot_title, 
    y_title=y_title
)

Multi Histogram - Distplot¶

ToC

The visualization normalizes each histogram, so the groups are not comparable by raw count when they contain different numbers of observations.
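To illustrate what normalization means here, the following sketch computes a density histogram by hand. It assumes the distplot uses probability-density normalization, which is the default of Plotly's `create_distplot`; whether the repository class relies on that function is not confirmed by this notebook. The helper `density_histogram` is hypothetical.

```python
import random


def density_histogram(values, bin_size):
    """Return (left_edges, densities) for equal-width bins.

    Each density is count / (n * bin_size), so the bar areas sum to 1
    regardless of how many observations the group contains.
    """
    low = min(values)
    n_bins = int((max(values) - low) // bin_size) + 1
    counts = [0] * n_bins
    for value in values:
        # Clamp the index to guard against floating-point edge cases.
        index = min(int((value - low) // bin_size), n_bins - 1)
        counts[index] += 1
    n = len(values)
    left_edges = [low + i * bin_size for i in range(n_bins)]
    densities = [count / (n * bin_size) for count in counts]
    return left_edges, densities


random.seed(0)
small_group = [random.gauss(-2, 1) for _ in range(100)]
large_group = [random.gauss(4, 1) for _ in range(300)]

# Both groups end up with a total bar area of 1 despite different sizes.
for group, bin_size in ((small_group, 0.5), (large_group, 1.0)):
    _, densities = density_histogram(group, bin_size)
    area = sum(density * bin_size for density in densities)
    print(f"n={len(group)}, bin_size={bin_size}, total area={area:.3f}")
```

Because every group is scaled to unit area, bar heights reflect the shape of each distribution and not the number of observations behind it.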

In [28]:
n = 100

x1 = randn(n)-2
x2 = randn(n+20)
x3 = randn(n+100)+2
x4 = randn(n+200)+4
x5 = randn(n+200)+6
x6 = randn(n+200)+8

data = [x1, x2, x3, x4, x5, x6]

group_labels = ["My Group 1", "My Group 2", "My Group 3", "My Group 4", "My Group 5", "My Group 6"]
bin_size = [0.1, 0.25, 0.5, 1, 0.125, 0.75]

plot_title = "My Super Random Histogram"
x_title = "Awesome Random Data"

Basic Distplot¶

ToC

In [29]:
import src.visualisations.plotly_histogram_distplot as HIST_DIST

reload(HIST_DIST)

histogram_distplot = HIST_DIST.PlotlyHistogramDistplot()

histogram_distplot.plot(data=data, plot_title=plot_title, x_title=x_title, group_labels=group_labels, bin_size=bin_size,
                     x_axis_min_max=None, vertical_lines_positions=None, dashboard=False)

Distplot with Range¶

ToC

In [30]:
import src.visualisations.plotly_histogram_distplot as HIST_DIST

reload(HIST_DIST)

histogram_distplot = HIST_DIST.PlotlyHistogramDistplot()

histogram_distplot.plot(data=data, plot_title=plot_title, x_title=x_title, group_labels=group_labels, bin_size=bin_size,
                     x_axis_min_max=[0, 4], vertical_lines_positions=None, dashboard=False)

Distplot with Lines¶

ToC

In [31]:
import src.visualisations.plotly_histogram_distplot as HIST_DIST

reload(HIST_DIST)

histogram_distplot = HIST_DIST.PlotlyHistogramDistplot()

histogram_distplot.plot(data=data, plot_title=plot_title, x_title=x_title, group_labels=group_labels, bin_size=bin_size,
                     x_axis_min_max=None, vertical_lines_positions=[-1.7, 0.75, 6.74], dashboard=False)

Multi Histogram¶

ToC

In [32]:
n = 100

x1 = randn(n)-2
x2 = randn(n+20)
x3 = randn(n+100)+2
x4 = randn(n+200)+4
x5 = randn(n+200)+6
x6 = randn(n+200)+8

data = [x1, x2, x3, x4, x5, x6]

group_labels = ["My Group 1", "My Group 2", "My Group 3", "My Group 4", "My Group 5", "My Group 6"]

plot_title = "My Super Random Histogram"
x_title = "Awesome Random Data"

Basic Multi Histogram¶

ToC

In [33]:
import src.visualisations.plotly_histogram_multi as HIST_MULTI

reload(HIST_MULTI)

histogram_multi = HIST_MULTI.PlotlyHistogramMulti()

histogram_multi.plot(data=data, plot_title=plot_title, x_title=x_title, group_labels=group_labels)

Full Version¶

ToC

In [34]:
import src.visualisations.plotly_histogram_multi as HIST_MULTI

reload(HIST_MULTI)

histogram_multi = HIST_MULTI.PlotlyHistogramMulti()

histogram_multi.plot(data=data, plot_title=plot_title, x_title=x_title, group_labels=group_labels, 
                    x_axis_min_max=(-7, 15), vertical_lines_positions=[-2, 0.5, 5.8])

Line Plot¶

ToC

In [35]:
x_list = list(range(1, 26))
x = array(x_list)
y_1 = 0.5 * x  # vectorised; same values as the element-wise loop
y_2 = log(x)
x_short = array(list(range(5, 16)))
y_short = x_short * 2
In [36]:
line_names = ["Linear", "Logarithm", "Shorter Line"]
plot_title = "Line Chart"
x_title = "X Values"
y_title = "Y Values"

Basic Version¶

ToC

In [37]:
import src.visualisations.plotly_line_chart as LINE

reload(LINE)

line_chart = LINE.PlotlyLineChart()

line_chart.plot(lines=[(x, y_1), (x, y_2)], plot_title=plot_title, x_title=x_title, y_title=y_title)

Full Version¶

ToC

In [38]:
import src.visualisations.plotly_line_chart as LINE

reload(LINE)

line_chart = LINE.PlotlyLineChart()

line_chart.plot(lines=[(x, y_1), (x, y_2), (x_short, y_short)], line_names=line_names, plot_title=plot_title, x_title=x_title, y_title=y_title)

Overall Customisation¶

ToC

Size of the Picture¶

ToC

In [39]:
seed(185)

data = normal(size=500)
print(f"Type of the data is: {type(data)}")
print(f"Several observations: {data[0:10]}")

plot_title = "My Super Random Histogram"
x_title = "Awesome Random Data"
Type of the data is: <class 'numpy.ndarray'>
Several observations: [ 0.04471179 -1.09776817  1.63620698 -1.12098251 -0.93842551 -0.71691279
  0.77317168  0.31091126  0.67555112  1.00791911]
In [40]:
import src.visualisations.plotly_histogram as HIST

reload(HIST)

histogram = HIST.PlotlyHistogram()

histogram.plot(data=data, plot_title=plot_title, x_title=x_title)
In [41]:
import src.visualisations.plotly_histogram as HIST

reload(HIST)

histogram = HIST.PlotlyHistogram()
histogram.customize_size(autosize=False, width=2000, height=1500)

histogram.plot(data=data, plot_title=plot_title, x_title=x_title)
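The General Constants cell defines `FIGURE_SIZE_SETTING` with the same three keys that `customize_size` is called with above, so the constant could be unpacked into the call and the size kept in one place. A minimal sketch with a stand-in class; `SizedFigure` is hypothetical, is not the repository's implementation, and returning `self` from `customize_size` is an assumption made only for this example.

```python
FIGURE_SIZE_SETTING = {"autosize": False, "width": 2200, "height": 750}


class SizedFigure:
    """Stand-in for a plot class; not the repository's implementation."""

    def __init__(self):
        self.layout = {}

    def customize_size(self, autosize=True, width=None, height=None):
        # Store the size keys the way a Plotly layout would hold them.
        self.layout.update({"autosize": autosize, "width": width, "height": height})
        return self


# Unpack the constant instead of repeating the three values.
figure = SizedFigure().customize_size(**FIGURE_SIZE_SETTING)
```

With the real class, the equivalent call would presumably be `histogram.customize_size(**FIGURE_SIZE_SETTING)` before `histogram.plot(...)`.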

Final Timestamp¶

ToC

In [42]:
Logger().end_timer()
2023-06-05 15:42:04,495 - file_console - DEBUG - Process: NOTEBOOK; Notebook name: visualisation_documentation.py; Timer ended; Process Duration [s]: 2044.23; Process Duration [m]: 34.07
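The log line above reports the duration in both seconds and minutes (2044.23 s is about 34.07 min). How the repository's `Logger` measures this is not shown in the notebook; the sketch below shows one common way to time a block with the standard library. The names `format_duration` and `timed` are hypothetical.

```python
import time
from contextlib import contextmanager


def format_duration(seconds):
    """Format a duration in seconds and minutes, as in the log line above."""
    return (
        f"Process Duration [s]: {seconds:.2f}; "
        f"Process Duration [m]: {seconds / 60:.2f}"
    )


@contextmanager
def timed(process_name):
    """Print how long the wrapped block took (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"Process: {process_name}; Timer ended; {format_duration(elapsed)}")


# Usage: time an arbitrary block of work.
with timed("NOTEBOOK; example"):
    sum(range(1000))
```

A context manager keeps the start and end of the measurement together, whereas the notebook's start_timer/end_timer pair spans the first and last cells.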